Abstract

This paper considers the problem of estimating a periodic function in a continuous time regression model with an additive stationary gaussian noise having an unknown correlation function. A general model selection procedure on the basis of arbitrary projective estimates, which does not require knowledge of the noise correlation function, is proposed. A non-asymptotic upper bound for the $L_2$-risk (oracle inequality) is derived under mild conditions on the noise. For the Ornstein-Uhlenbeck noise the risk upper bound is shown to be uniform in the nuisance parameter. In the case of gaussian white noise the constructed procedure has some advantages over the procedure based on the least squares estimates (LSE). The asymptotic minimaxity of the estimates is proved. The proposed model selection scheme is also extended to the estimation problem based on discrete data, which applies when high frequency sampling cannot be provided.
1 Introduction

Consider a regression model in continuous time

$$dy_t = S(t)\,dt + d\xi_t, \tag{1}$$

where $S(t)$ is an unknown 1-periodic function in the space $L_2[0,1]$, $(\xi_t)_{t\ge 0}$ is a continuous gaussian process with zero mean such that for each $n \ge 1$ the stochastic integral $\int_0^n f(t)\,d\xi_t$ is well-defined for any non-random function $f$ from $L_2[0,n]$. The correlation function of the noise is unknown. This process can be modeled in different ways.

Example 1. $\xi_t$ is a scalar non-explosive Ornstein-Uhlenbeck process defined by the equation

$$d\xi_t = \theta\,\xi_t\,dt + dw_t, \tag{2}$$

where $(w_t)_{t\ge 0}$ is a standard brownian motion and $\theta \le 0$ is an unknown parameter; the initial value $\xi_0 \sim \mathcal N(0, 1/(2|\theta|))$ if $\theta < 0$ and $\xi_0 = 0$ if $\theta = 0$.

Example 2. $\xi_t$ is a stationary autoregressive process of order $q \ge 2$ satisfying the stochastic differential equation

$$\xi_t^{(q)} = \theta_1\,\xi_t^{(q-1)} + \ldots + \theta_q\,\xi_t + \dot w_t. \tag{3}$$

Here $(\dot w_t)_{t\ge 0}$ is a white gaussian noise and the unknown vector $\theta = (\theta_1,\ldots,\theta_q)'$ belongs to the stability region of the process

$$A = \{\theta \in \mathbb R^q : \max_{1\le i\le q} \mathrm{Re}\,\lambda_i(\theta) < 0\}, \tag{4}$$

where $(\lambda_i(\theta))_{1\le i\le q}$ are the eigenvalues of the matrix

I q is the identity matrix of order q. 

Models of type (1) and their discrete-time analogues have been studied by a number of authors (see Efroimovich (1999), Liptser and Shiryaev (1974), Konev and Pergamenshchikov (2003), Nemirovskii (2000) and references therein). The estimation problem for the periodic signal $S(t)$ in model (1)-(2) has been thoroughly studied in the case when $(\xi_t)_{t\ge 0}$ is a white gaussian noise (see, for example, Ibragimov and Hasminskii (1981) for details and further references).
A discrete-time counterpart of model (1)-(2) was applied in econometric problems for modeling consumption as a function of income (Goldfeld and Quandt (1972)).

As is well known, the problem of nonparametric estimation of $S(t)$ comprises the following three settings: estimation of the function at a fixed point $t_0$, estimation in the uniform metric and estimation in the integral metric. The first two problems are usually solved by making use of kernel and local polynomial estimates. This paper focuses on the third setting with the quadratic metric. Estimation in the integral metric is based, as a rule, on the projective estimates which were first proposed in Chentsov (1962) for estimating the distribution density in a scheme of i.i.d. observations. The heart of this method is to approximate the unknown function with a finite Fourier series. Applying the projective estimates to the regression model (1) with a white noise leads to the optimal convergence rate in $L_2(0,1)$ provided that the smoothness of $S$ is known (see, for example, Ibragimov and Hasminskii (1981)). Another, adaptive, approach based on the model selection method (see, for example, Barron et al. (1999), Baraud (2000), Birgé and Massart (2001) and Fourdrinier and Pergamenshchikov (2007)) enables one to study this problem in the nonasymptotic setting when the smoothness of the function $S$ is unknown. It should be noted that this method can also be used for model (1) under the condition that the correlation function $\mathbf E\,\xi_t\xi_s$ is exactly known and, besides, the unknown function $S$ belongs to the subspace spanned by its eigenfunctions (see Theorem 1, p. 11 in Birgé and Massart (2001)). In our case, when the noise correlation function is unknown, this method cannot be applied. This paper develops a general model selection method for the regression scheme (1) with unknown correlation properties.

Note that the usual nonasymptotic model selection procedure proposed in Barron et al. (1999), Baraud (2000), Birgé and Massart (2001) is based on the least squares estimators (LSE) which, as was shown in Goloubev (1982) and Pinsker (1981), are not efficient in the problem of nonparametric regression. Our approach is close to the general model selection method proposed in Fourdrinier and Pergamenshchikov (2007) for discrete time models with spherically symmetric errors, which allows one to use any projective estimators in the model selection procedure, including the LSE. In Section 2 we propose a general model selection procedure for a regression scheme in continuous time (1) with unknown correlation structure of the gaussian noise. In Theorem 1, under some loose conditions on the noise, we establish a nonasymptotic upper bound for the quadratic risk in which the principal term is minimal over the set of all admissible basic estimates. Inequalities of this type are usually called oracle inequalities.
In the case of the Ornstein-Uhlenbeck noise (2), the risk upper bound is shown to be uniform in the nuisance parameter (Corollary 2).

The rest of the paper is organized as follows. In Section 3 we consider the case of white gaussian noise $\xi_t$ and show that the possibility of choosing different projective estimators in the procedure may lead to a sharper upper bound for the mean square estimation accuracy. In Section 4 the upper and lower bounds for the minimax quadratic risk are obtained under the assumption that the smoothness of $S$ is unknown. In Section 5 we consider the estimation problem for the regression model (1) assuming that it is accessible for observations only at discrete times $t_k = k/p$, $k = 0, 1, \ldots$ Such an observation scheme is more appropriate in a number of applications where one cannot provide high frequency data sampling. Theorem 5 establishes the nonasymptotic oracle inequality in this case. The Appendix contains some technical results.



2 Nonasymptotic estimation 

In this section we consider the estimation problem for the model (1) in the nonasymptotic setting, i.e. assuming that the estimator of $S$ is based on the observations $(y_t)_{0\le t\le n}$ with a fixed duration $n$. For this we apply the general model selection approach proposed in Fourdrinier and Pergamenshchikov (2007) for the discrete-time regression model.

First we introduce some notation. Let $\mathcal X$ be the Hilbert space of square integrable 1-periodic functions on $\mathbb R$ with the usual scalar product

$$(x,y) = \int_0^1 x(t)\,y(t)\,dt$$

and $(\phi_j)_{j\ge 1}$ be a system of orthonormal functions in $\mathcal X$, i.e. $(\phi_i,\phi_j) = 0$ if $i \ne j$ and $\|\phi_j\|^2 = (\phi_j,\phi_j) = 1$.

Then we impose the following additional conditions on the noise $(\xi_t)_{t\ge 0}$ in (1). Assume that

C1) For each $n \ge 1$ and $k \ge 1$ the vector $\zeta(n) = (\zeta_1(n),\ldots,\zeta_k(n))'$ with components

$$\zeta_j(n) = \frac{1}{\sqrt n}\int_0^n \phi_j(t)\,d\xi_t \tag{6}$$

is gaussian with non-degenerate covariance matrix $B_{k,n} = \mathbf E\,\zeta(n)\zeta'(n)$.
C2) The maximal eigenvalues of the matrices $B_{k,n}$ satisfy the following inequality

$$\sup_{k\ge 1}\ \sup_{n\ge 1}\ \lambda_{\max}(B_{k,n}) \le \lambda^*,$$

where $\lambda^*$ is some known positive constant.

Processes (2) and (3) in Examples 1-2, as is shown in Lemmas 2-3, satisfy condition C1). Condition C2) is satisfied for process (2) with $\lambda^* = 2$. Condition C2) also holds for process (3) provided that the vector $\theta$ belongs to the following compact set

$$K_\delta = \{\theta \in A : \max_{1\le j\le q}\mathrm{Re}\,\lambda_j(\theta) \le -\delta,\ |A(\theta)| \le \delta^{-1}\}, \tag{7}$$

where $0 < \delta < 1$ is a known constant; $|\cdot|$ stands for the euclidean norm of a matrix. Under this assumption process (3) satisfies condition C2) with

$$\lambda^* = \lambda^*(\delta) = \frac{2}{\delta^2}\,F^*(\delta)\,J^*(\delta), \tag{8}$$

where 

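Let $\mathbb N$ be the set of positive integers, i.e. $\mathbb N = \{1, 2, \ldots\}$. Denote by $\mathcal M$ some finite set of finite subsets of $\mathbb N$ and by $(\mathcal D_m)_{m\in\mathcal M}$ a family of linear subspaces of $\mathcal X$ such that

$$\mathcal D_m = \{x \in \mathcal X : x = \sum_{j\in m}\lambda_j\phi_j,\ \lambda_j \in \mathbb R\}.$$

Let $d_m = \dim \mathcal D_m$ be the number of elements in a subset $m$. Denote by $S_m$ the projection of $S$ on $\mathcal D_m$, i.e.

$$S_m = \sum_{j\in m}\alpha_j\,\phi_j, \qquad \alpha_j = (S,\phi_j). \tag{9}$$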

To estimate the function $S$ in (1) we will apply a general model selection approach. It requires first choosing some class of projective estimators $\tilde S_m$ of $S_m$, which may be any measurable functions of the observations $(y_t)_{0\le t\le n}$ taking values in $\mathcal D_m$. For example, one can take the LSE $\hat S_m$ of $S$, which is the minimizer, with respect to $x \in \mathcal D_m$, of the quantity

$$\gamma_n(x) = \|x\|^2 - \frac{2}{n}\int_0^n x(t)\,dy_t \tag{10}$$

and has the form

$$\hat S_m = \sum_{j\in m}\hat\alpha_j\,\phi_j, \qquad \hat\alpha_j = \frac1n\int_0^n \phi_j(t)\,dy_t. \tag{11}$$
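The LSE coefficients in (11) can be illustrated numerically. The following is only a minimal sketch, assuming an Euler discretization of model (1) with the Ornstein-Uhlenbeck noise (2) and a Riemann-Ito sum in place of the stochastic integral; the function names `simulate_increments` and `lse_coefficient`, the grid parameter `steps_per_unit` and the random generator argument are hypothetical and are not part of the paper.

```python
import math
import random


def simulate_increments(S, theta, n, steps_per_unit, rng):
    """Euler scheme for dy_t = S(t) dt + d xi_t with d xi_t = theta * xi_t dt + dw_t.

    Returns (times, dy): left endpoints of the grid cells on [0, n] and the
    corresponding increments of y. This is an approximation, not the exact model.
    """
    dt = 1.0 / steps_per_unit
    # Stationary start as in Example 1: xi_0 ~ N(0, 1/(2|theta|)) if theta < 0,
    # xi_0 = 0 if theta = 0.
    xi = rng.gauss(0.0, math.sqrt(1.0 / (2.0 * abs(theta)))) if theta < 0 else 0.0
    times, dy = [], []
    for k in range(n * steps_per_unit):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))
        dxi = theta * xi * dt + dw
        times.append(t)
        dy.append(S(t) * dt + dxi)
        xi += dxi
    return times, dy


def lse_coefficient(phi, times, dy, n):
    """Riemann-Ito sum approximating (1/n) * int_0^n phi(t) dy_t from (11)."""
    return sum(phi(t) * d for t, d in zip(times, dy)) / n
```

For instance, with $S(t) = 1 + 0.5\sqrt2\cos(2\pi t)$ the coefficients along the functions $1$ and $\sqrt2\cos(2\pi t)$ should come out close to $1$ and $0.5$ for large $n$, since the noise contribution to each coefficient has variance at most $\lambda^*/n$ under C2).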

Let $(l_m)_{m\in\mathcal M}$ be a sequence of prior weights such that $l_m \ge 1$ for all $m \in \mathcal M$. We set

$$l_* = \sum_{m\in\mathcal M} e^{-l_m d_m}. \tag{12}$$

Further one needs a penalty term on the set $\mathcal M$. We take it in the form suggested in Birgé and Massart (2001). We define the penalty term as

P( rn ) = p l n^n W ith /0 =4A*^2— , (13) 
n z # -l 

where $z_*$ is the maximal root of the equation $\ln z = z - 2$, which is approximately equal to $z_* \approx 3.1462$.
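The numerical value of $z_*$ can be checked directly. The sketch below is an illustration only: it uses plain bisection on an interval chosen by hand (the bracket $[2, 10]$ and the function name `maximal_root` are assumptions made for this example), relying on the fact that $\ln z - z + 2$ is decreasing for $z > 1$.

```python
import math


def maximal_root(lo=2.0, hi=10.0, tol=1e-12):
    """Bisection for the largest root of ln z = z - 2.

    f(z) = ln z - z + 2 satisfies f(2) > 0 > f(10) and is strictly decreasing
    for z > 1, so [lo, hi] brackets exactly one root, the maximal one.
    """
    f = lambda z: math.log(z) - z + 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The equation also has a second, smaller root in $(0,1)$, which is why the bracket is placed to the right of $z = 1$.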

Minimizing the penalized empirical contrast $\gamma_n(\tilde S_m) + P_n(m)$ with respect to $m \in \mathcal M$ one finds

$$\tilde m = \mathrm{argmin}_{m\in\mathcal M}\{\gamma_n(\tilde S_m) + P_n(m)\} \tag{14}$$

and obtains the model selection procedure $\tilde S_{\tilde m}$ corresponding to a specific class of projective estimators $(\tilde S_m)_{m\in\mathcal M}$. For the LSE family $(\hat S_m)_{m\in\mathcal M}$, this yields $\hat S_{\hat m}$ with

$$\hat m = \mathrm{argmin}_{m\in\mathcal M}\{\gamma_n(\hat S_m) + P_n(m)\}. \tag{15}$$
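For the LSE and an orthonormal system the contrast in (15) has a simple form: substituting (11) into (10) gives $\gamma_n(\hat S_m) = -\sum_{j\in m}\hat\alpha_j^2$. The sketch below uses this identity for the special case of nested models $m_i = \{1,\ldots,i\}$ with $l_m = 1$ and a penalty of the form $\rho\, d_m/n$. It is an illustration under these assumptions; the function name `select_order` is hypothetical and the coefficient `rho` is left as a free parameter rather than set to the value prescribed in (13).

```python
def select_order(alpha_hat, n, rho):
    """Return the dimension i minimizing -sum_{j<=i} alpha_hat[j]^2 + rho * i / n.

    alpha_hat: LSE coefficients in the order of the nested models {1}, {1,2}, ...
    n:         observation duration
    rho:       penalty coefficient (assumed to be supplied by the user)
    """
    best_i, best_val, partial = None, float("inf"), 0.0
    for i, a in enumerate(alpha_hat, start=1):
        partial += a * a                  # ||S_hat_m||^2 for m = {1, ..., i}
        val = -partial + rho * i / n      # penalized contrast for this model
        if val < best_val:
            best_i, best_val = i, val
    return best_i
```

With a zero penalty the largest model is always selected, because the contrast decreases with every added coefficient; the penalty is what stops small, noise-dominated coefficients from being included.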
Our first result is the following. 

Theorem 1. Assume that the conditions C1)-C2) are fulfilled for the noise in (1). Then for any class of projective estimators $(\tilde S_m)_{m\in\mathcal M}$ the general model selection procedure satisfies the following oracle inequality

$$\mathbf E_S\|\tilde S_{\tilde m} - S\|^2 \le \inf_{m\in\mathcal M} a_m(S) + \frac{\tau_0}{n}, \tag{16}$$

where $\mathbf E_S$ denotes the expectation with respect to the distribution of (1) given $S$,

a m (S) = 3E s \\S m - S\\ 2 + 16X*z^^lhll and Tq - u * 



n u z, - 1 
The proof of Theorem 1 is given in the Appendix.

Remark 1. Note that the choice of the coefficient $\rho$ in the penalty term (13), as will be shown in the proof of the theorem, provides the minimal value of the principal term $a_m(S)$.

Now we will find the upper bound (16) for the LSE model selection procedure $\hat S_{\hat m}$ defined by (11) and (15). To this end we have to calculate the accuracy of $\hat S_m$ for any $m \in \mathcal M$. We have

$$\mathbf E_S\|\hat S_m - S\|^2 = \|S_m - S\|^2 + \mathbf E_S\|\hat S_m - S_m\|^2 = \|S_m - S\|^2 + \sum_{j\in m}\mathbf E_S(\hat\alpha_j - \alpha_j)^2,$$

where $S_m$ is given in (9). Moreover, the condition C2) yields

$$\mathbf E_S(\hat\alpha_j - \alpha_j)^2 = \frac{1}{n^2}\,\mathbf E_S\Big(\int_0^n \phi_j(t)\,d\xi_t\Big)^2 \le \frac{\lambda^*}{n}.$$

Therefore

$$\mathbf E_S\|\hat S_m - S\|^2 \le \|S_m - S\|^2 + \lambda^*\frac{d_m}{n}.$$


Thus, we obtain the following result. 

Corollary 1. Under the conditions C1) and C2) the model selection procedure $\hat S_{\hat m}$, defined by (11) and (15), satisfies the inequality

$$\mathbf E_S\|\hat S_{\hat m} - S\|^2 \le \inf_{m\in\mathcal M}\hat a_m(S) + \frac{\tau_0}{n}, \tag{17}$$

where

$$\hat a_m(S) = 3\|S_m - S\|^2 + \tau_1\lambda^*\frac{l_m d_m}{n}, \qquad \tau_1 = 3 + 16 z_*.$$

Consider the upper bound in (17) in more detail for the model (1)-(2).

Corollary 2. For the model (1)-(2) the LSE model selection procedure $\hat S_{\hat m}$, defined by (11) and (15) with $\lambda^* = 2$, satisfies, for any $\theta \le 0$, the inequality

$$\mathbf E_S\|\hat S_{\hat m} - S\|^2 \le \inf_{m\in\mathcal M} b_m(S) + \frac{\tau_0}{n}, \tag{18}$$

where $\tau_0$ is given in (16),

$$b_m(S) = 3\|S_m - S\|^2 + 2\tau_1\frac{l_m d_m}{n}.$$

Remark 2. Note that for the model (1)-(2) the LSE model selection procedure satisfies the oracle inequality uniformly in the nuisance parameter $\theta$, including the boundary of the stationarity region of the Ornstein-Uhlenbeck process, i.e. $\theta = 0$.
3 The improvement of LSE.

In this section we consider a special case of the model (1)-(2) when $\theta = 0$, i.e.

$$dy_t = S(t)\,dt + dw_t. \tag{19}$$

By applying the improvement method proposed in Fourdrinier and Pergamenshchikov (2007) we will show that the upper bound in the oracle inequality can be lessened by a proper choice of the projective estimators. Let us introduce a class of estimators of the form

$$S_m^*(t) = \hat S_m(t) + \Psi_m(\hat\alpha)(t). \tag{20}$$

Here $\Psi_m$ is a function from $\mathbb R^d$ into $\mathcal D_m$, $d = d_m$, i.e.

$$\Psi_m(x)(t) = \sum_{j\in m} v_j(x)\,\phi_j(t), \qquad x \in \mathbb R^d, \tag{21}$$

and $(v_j(\cdot))_{j\in m}$ are $\mathbb R^d \to \mathbb R$ functions such that $\mathbf E_S v_j^2(\hat\alpha) < \infty$, where $\hat\alpha = (\hat\alpha_j)_{j\in m}$ is the vector with the components $\hat\alpha_j$ defined in (11). The functions $v(x) = (v_j(x))_{j\in m}$ will be specified below. Let

$$\Delta_m(S) = \mathbf E_S\|S_m^* - S_m\|^2 - \mathbf E_S\|\hat S_m - S_m\|^2. \tag{22}$$

It is easy to check that

$$\Delta_m(S) = 2\,\mathbf E_S(\Psi_m, \hat S_m - S_m) + \mathbf E_S\|\Psi_m\|^2. \tag{23}$$

This function can be found explicitly for the model (19).

Lemma 1. Let $S_m^*$ be defined by (20)-(21) with continuously differentiable functions $v_j$ such that $\mathbf E_S v_j^2(\hat\alpha) < \infty$. Then $\Delta_m(S) = \mathbf E_S L(\hat\alpha)$, where

$$L(x) = \frac{2}{n}\,\mathrm{div}\,v(x) + \|v(x)\|^2. \tag{24}$$

Proof. From (11) and (19) one has

$$\hat\alpha_j = \frac1n\int_0^n \phi_j(t)\,dy_t = \alpha_j + \frac1n\int_0^n \phi_j(t)\,dw_t,$$

where $\alpha_j = \int_0^1 \phi_j(t)S(t)\,dt$. Therefore the vector $\hat\alpha = (\hat\alpha_j)_{j\in m}$ has a normal distribution $\mathcal N(\alpha, n^{-1}I_d)$, where $\alpha = (\alpha_j)_{j\in m}$ and $I_d$ is the unit matrix of order $d$. This enables one to find the explicit expression for the first term in the right-hand side of (23). Indeed,

$$J = \mathbf E_S(\Psi_m, \hat S_m - S_m) = \mathbf E_S\sum_{j\in m} v_j(\hat\alpha)(\hat\alpha_j - \alpha_j) = \mathbf E_S(v(\hat\alpha), \hat\alpha - \alpha) = \int_{\mathbb R^d}(v(u), u - \alpha)\,g(\|u - \alpha\|^2)\,du,$$

where

$$g(a) = \Big(\frac{n}{2\pi}\Big)^{d/2} e^{-na/2}. \tag{25}$$

Making the spherical change of variables yields

$$J = \int_0^\infty\int_{S_{r,d}}(v(u), u - \alpha)\,\nu_{r,d}(du)\,g(r^2)\,dr = \int_0^\infty\int_{S_{r,d}}(v(u), e(u))\,\nu_{r,d}(du)\,r\,g(r^2)\,dr,$$

where $S_{r,d} = \{u \in \mathbb R^d : \|u - \alpha\| = r\}$, $\nu_{r,d}(\cdot)$ is the superficial measure on the sphere $S_{r,d}$ and $e(u) = (u - \alpha)/\|u - \alpha\|$ is a normal vector to this sphere. By applying the Ostrogradsky-Stokes divergence theorem we obtain that

$$J = \int_0^\infty\int_{B_{r,d}}\mathrm{div}\,v(u)\,du\ r\,g(r^2)\,dr$$

with $B_{r,d} = \{u \in \mathbb R^d : \|u - \alpha\| \le r\}$. By the Fubini theorem and the definition of $g$ in (25) one gets

$$J = \frac12\int_{\mathbb R^d}\int_{\|u-\alpha\|^2}^\infty g(a)\,da\ \mathrm{div}\,v(u)\,du = \frac1n\int_{\mathbb R^d} g(\|u - \alpha\|^2)\,\mathrm{div}\,v(u)\,du = \frac1n\,\mathbf E_S\,\mathrm{div}\,v(\hat\alpha).$$

This leads to the assertion of Lemma 1. □
In particular for $d_m > 2$, if one takes

$$v(u) = -\frac{(d_m - 2)\,u}{n\|u\|^2}, \quad\text{then}\quad L(u) = -\frac{(d_m - 2)^2}{n^2\|u\|^2}$$

and hence $\Delta_m(S) < 0$, that is, the estimate (20) outperforms the least squares estimate (11) in the approximation of $S_m$. This makes it possible to improve the model selection procedure by making use of the estimates (20) instead of the least squares estimates $\{\hat S_m\}$. As a direct consequence of Theorem 1 one obtains the following result for the improved model selection procedure $S^*_{m^*}$.
Theorem 2. For the model (19) the improved model selection procedure $S^*_{m^*}$ defined by (14) with $\tilde S_m = S_m^*$ and $\lambda^* = 2$ satisfies the inequality

$$\mathbf E_S\|S^*_{m^*} - S\|^2 \le \inf_{m\in\mathcal M} a_m^*(S) + \frac{\tau_0}{n}, \tag{26}$$

where $\tau_0$ is given in (16),

$$a_m^*(S) = 3\,\mathbf E_S\|S_m^* - S\|^2 + 32 z_*\frac{l_m d_m}{n}.$$
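The identity behind the improvement in this section can be verified numerically. The sketch below is an illustration only, with hypothetical function names: it evaluates $L(u) = \frac{2}{n}\,\mathrm{div}\,v(u) + \|v(u)\|^2$ from (24) for the shrinkage correction $v(u) = -(d-2)u/(n\|u\|^2)$, taking the divergence by central finite differences, and compares it with the closed form $-(d-2)^2/(n^2\|u\|^2)$.

```python
def shrink(u, n):
    """Correction v(u) = -(d - 2) u / (n |u|^2) added to the LSE coefficients."""
    d = len(u)
    norm2 = sum(x * x for x in u)
    return [-(d - 2) * x / (n * norm2) for x in u]


def L_closed_form(u, n):
    """Closed form -(d - 2)^2 / (n^2 |u|^2) stated in the text."""
    d = len(u)
    return -((d - 2) ** 2) / (n ** 2 * sum(x * x for x in u))


def L_numeric(u, n, eps=1e-6):
    """(2/n) div v(u) + |v(u)|^2 with the divergence from central differences."""
    div = 0.0
    for j in range(len(u)):
        up = list(u)
        up[j] += eps
        dn = list(u)
        dn[j] -= eps
        div += (shrink(up, n)[j] - shrink(dn, n)[j]) / (2.0 * eps)
    v = shrink(u, n)
    return 2.0 / n * div + sum(x * x for x in v)
```

The closed form is negative whenever $d > 2$ and $u \neq 0$, which is the reason the corrected estimate has a smaller quadratic error than the LSE in the approximation of $S_m$.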



4 Asymptotic estimation 
4.1 The risk upper bound 

In this section we consider the asymptotic estimation problem for the model (1). To this end, we additionally assume that all functions in the orthonormal system $(\phi_j)_{j\ge 1}$ are 1-periodic and the unknown function $S(\cdot)$ in the model (1) belongs to the following functional class

$$\Theta_{\beta,r} = \{S \in C(\mathbb R)\cap\mathcal X : \max_{n\ge 1} n^{2\beta}\,\varsigma_n(S) \le r^2\}. \tag{27}$$

Here $C(\mathbb R)$ denotes the set of all continuous $\mathbb R \to \mathbb R$ functions and

$$\varsigma_n(S) = \sum_{j=n}^\infty s_j^2, \tag{28}$$

where $(s_j)_{j\ge 1}$ are the Fourier coefficients for the basis $(\phi_j)_{j\ge 1}$, i.e.

$$s_j = (S,\phi_j) = \int_0^1 S(t)\,\phi_j(t)\,dt;$$

$\beta > 0$ and $r > 0$ are unknown constants.

Similarly to Galtchouk and Pergamenshchikov (2006) we now define the risk for an estimator $\hat S_n$ (a measurable function of the observations $(y_t)_{0\le t\le n}$ in (1)) as follows

$$R_n(\hat S_n,\beta) = \sup_{S\in\Theta_{\beta,r}}\ \sup_{Q\in\mathcal P_\kappa}\mathbf E_{S,Q}\|\omega_n(\hat S_n - S)\|^2, \tag{29}$$

where $\omega_n = \omega_n(\beta) = n^{\beta/(2\beta+1)}$. Here $\mathcal P_\kappa$ is some class of distributions $Q$ (in the space $C[0,+\infty)$) of the noise process $(\xi_t)_{t\ge 0}$ satisfying conditions C1) and C2) with $\lambda^* = \lambda^*(Q) \le \kappa < \infty$ for some known fixed parameter $\kappa$. In addition, this class is assumed to include the Wiener distribution $Q_0$. The second index in $\mathbf E_{S,Q}$ denotes that the expectation is taken with respect to the distribution of the process (1) corresponding to the noise distribution $Q$.

Note that for the model (1)-(2) $\mathcal P_\kappa$ is the class of distributions of the processes (2) with $\theta \le 0$. In this case $\kappa = 2$. For the model (1)-(3)

$$\mathcal P_\kappa = \mathcal Q_\delta \cup \{Q_0\},$$

where $\mathcal Q_\delta$ is the family of distributions of the processes (3) of order $q \ge 2$ with the parameters belonging to the set (7) for some $0 < \delta < 1$. In this case $\kappa = \max(2, \lambda^*(\delta))$, where $\lambda^*(\delta)$ is given in (8).

We will apply the ordered variable model selection procedure (see Barron et al. (1999), p. 315), for which $\mathcal M = \{m_1,\ldots,m_n\}$ with $m_i = \{1,\ldots,i\}$, therefore $d_{m_i} = i$. Then

$$\mathcal D_{m_i} = \{x \in \mathcal X : x = \sum_{j=1}^i a_j\phi_j,\ a_j \in \mathbb R\}.$$

For the ordered variable model selection procedure one can take $l_m = 1$ for all $m \in \mathcal M$ and find

$$l_* = \sum_{i=1}^n e^{-d_{m_i}} \le \frac{1}{e - 1}.$$

In the sequel we denote by $\hat S_{\hat m}$ the LSE model selection procedure (11), (15) with $\lambda^*$ replaced by $\kappa$. Now we will show that the risk (29) for this procedure is finite.

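Theorem 3. The estimator $\hat S_{\hat m}$ satisfies the following asymptotic inequality

$$\limsup_{n\to\infty}\ \sup_{\beta>0} R_n(\hat S_{\hat m},\beta) < \infty. \tag{30}$$

Proof. Taking into account (17) one gets, for any $Q \in \mathcal P_\kappa$,

$$\mathbf E_{S,Q}\|\hat S_{\hat m} - S\|^2 \le \inf_{m\in\mathcal M}\Big(3\|S_m - S\|^2 + \tau_1\kappa\frac{d_m}{n}\Big) + \frac{\tau_0}{n} \le \inf_{1\le i\le n}\Big(3\|S_{m_i} - S\|^2 + \tau_1\kappa\frac{i}{n}\Big) + \frac{\tau_0}{n}.$$

Further, for any function $S$ from $\Theta_{\beta,r}$, one has

$$\|S_{m_i} - S\|^2 = \sum_{j=i+1}^\infty s_j^2 = \varsigma_{i+1}(S) \le \frac{r^2}{(i+1)^{2\beta}}.$$

Therefore for each $1 \le i \le n$

$$\sup_{S\in\Theta_{\beta,r}}\ \sup_{Q\in\mathcal P_\kappa}\mathbf E_{S,Q}\|\hat S_{\hat m} - S\|^2 \le \frac{3r^2}{(i+1)^{2\beta}} + \tau_1\kappa\frac{i}{n} + \frac{\tau_0}{n}.$$

Substituting $i = i_n = [n^{1/(2\beta+1)}] + 1$ leads to (30). □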

4.2 The risk lower bound 

Now we study the lower bound for the risk (29). We assume that the orthogonal system $(\phi_j)_{j\ge 1}$ in (27) is trigonometric, i.e.

$$\phi_1(x) = 1, \quad\text{and for } j \ge 2 \quad \phi_j(x) = \sqrt2\,\mathrm{Tr}_j(2\pi[j/2]x), \tag{31}$$

where $\mathrm{Tr}_j(x) = \cos x$ for even $j \ge 1$ and $\mathrm{Tr}_j(x) = \sin x$ for odd $j \ge 1$; $[a]$ is the integer part of $a$.



Theorem 4. The lower bound of the risk (29) over all estimates is strictly positive, i.e. for any $\beta > 1$

$$\liminf_{n\to\infty}\ \inf_{\hat S_n} R_n(\hat S_n,\beta) > 0. \tag{32}$$

Proof. In order to show (32) it suffices to check this inequality for the model (19), i.e. that for any $\beta > 1$

$$\liminf_{n\to\infty}\ \inf_{\hat S_n}\ \sup_{S\in\Theta_{\beta,r}}\mathbf E_{S,Q_0}\|\omega_n(\hat S_n - S)\|^2 > 0. \tag{33}$$

To this end we apply the method proposed in Fourdrinier and Pergamenshchikov (2007) to our case. First, we construct an auxiliary parametric class of functions in the set $\Theta_{\beta,r}$. Let $\beta = k + \alpha$ with $k = [\beta]$ and $0 \le \alpha < 1$. Let $V(\cdot)$ be a $k+1$ times continuously differentiable function such that $V(u) = 0$ for $|u| \ge 1$ and $\int_{-1}^1 V^2(u)\,du = 1$. Let $m = [n^{1/(2\beta+1)}]$ and $\Gamma_\delta$ be a cube in $\mathbb R^m$ of the form

$$\Gamma_\delta = \{z = (z_1,\ldots,z_m)' \in \mathbb R^m : |z_i| \le \delta,\ 1 \le i \le m\},$$
where $\delta = \nu/\omega_n$, $\nu > 0$. Now, viewing the function $V(\cdot)$ as a kernel, one introduces a parametric class of 1-periodic functions $(S_z)_{z\in\Gamma_\delta}$ where

$$S_z(t) = \sum_{j=1}^m z_j\,\psi_j(t), \qquad 0 \le t \le 1;$$

$(\psi_j(t))_{1\le j\le m}$ are 1-periodic functions defined on the interval $[0,1]$ as

$$\psi_j(t) = V\Big(\frac{t - a_j}{h}\Big) \quad\text{with } h = 1/2m \text{ and } a_j = (2j-1)/2m. \tag{34}$$

It will be observed that, for $0 \le i \le k-1$ and $z \in \Gamma_\delta$,

$$\sup_{0\le t\le 1}|S_z^{(i)}(t)| \le 2^i \sup_{|a|\le 1}|V^{(i)}(a)|\,\frac{\nu}{n^{(\beta-i)/(2\beta+1)}} \to 0 \quad\text{as } n \to \infty.$$



In order to check the second condition in (1621) . we estimate the increment of 
A;th derivative of S z (-). For any < s , t < 1 and z S Tj, one has 



|5<*>(f) - 4 fc )( s )| < 



^ m ' 



(35) 



where 



3=1 



ft 



s — a. 



If s and t belong to the same interval, that is, a - — h < s < t < a 1n + ft 



then putting V* = sup| a | <;l (a) | one obtains 



30 



ft 



ft 



2ft ~ ft Q 
for each < a < 1. If s and i belong to different intervals, that is, 

a jo - ft < s < a jo + ft < a h -h<t<a h +h, j < ji , 

then setting s* = a Q + ft and i* = — ft, similarly to (f36|h one gets 

't — a. 



(36) 



A, 



ft 

s — a 



v (k) I 1 * a n 



30 



< 2 i - q i/*-^ (it-**r + is-sT) < t- y* \t - s\ a . 



ft 
1 



ft 

s' — a 



30 



ft 



2-a 



1 



ft' 



ft Q 
From here and (35)-(36) we come to the estimate



\Sjp(t) - Si k \s)\ < 2 2+k uV* — \t-s\ a . 

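Therefore (see Lemma 5 in Appendix 6.4) there exist $\nu > 0$ and $n_0 \ge 1$ such that $S_z \in \Theta_{\beta,r}$ for all $z \in \Gamma_\delta$ and $n \ge n_0$. Further, we introduce the prior distribution on $\Gamma_\delta$ with the density

$$\pi(z) = \pi(z_1,\ldots,z_m) = \prod_{l=1}^m \pi_l(z_l), \qquad \pi_l(u) = \frac1\delta\,G\Big(\frac u\delta\Big).$$

The function $G(u) = G_*\,e^{-1/(1-u^2)}$ for $|u| \le 1$ and $G(u) = 0$ for $|u| \ge 1$, where $G_*$ is a positive constant such that $\int_{-1}^1 G(u)\,du = 1$.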

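Let $\hat S_n(\cdot)$ be an estimate of $S(\cdot)$ based on the observations $(y_t)_{0\le t\le n}$ in (1). Then for any $n \ge n_0$ we can estimate from below the supremum in (33) as

$$\sup_{S\in\Theta_{\beta,r}}\mathbf E_{S,Q_0}\|\hat S_n - S\|^2 \ge \int_{\Gamma_\delta}\mathbf E_{S_z,Q_0}\|\hat S_n - S_z\|^2\,\pi(z)\,dz.$$

Moreover, by the definition of $S_z$, we obtain

$$\|\hat S_n - S_z\|^2 \ge \sum_{l=1}^m(\hat z_l - z_l)^2\int_0^1\psi_l^2(t)\,dt = h\sum_{l=1}^m(\hat z_l - z_l)^2,$$

where $\hat z_l = \int_0^1\hat S_n(x)\psi_l(x)\,dx/\|\psi_l\|^2$. Therefore,

$$\sup_{S\in\Theta_{\beta,r}}\mathbf E_{S,Q_0}\|\hat S_n - S\|^2 \ge h\sum_{l=1}^m\Lambda_l, \tag{37}$$

where $\Lambda_l = \int_{\Gamma_\delta}\mathbf E_{S_z,Q_0}(\hat z_l - z_l)^2\,\pi(z)\,dz$. To apply now Lemma 6 we note that in this case

$$\zeta_l(z) = \int_0^n\psi_l(t)\,dy_t - \int_0^n S_z(t)\psi_l(t)\,dt$$

and, therefore, $A_l = \mathbf E_{S_z,Q_0}\zeta_l^2(z) = \int_0^n\psi_l^2(t)\,dt = nh$. Moreover, in this case

$$B_l = \int_{-\delta}^{\delta}\frac{(\pi_l'(u))^2}{\pi_l(u)}\,du = \delta^{-2}I_G, \qquad I_G = 8\int_0^1 u^2(1-u^2)^{-4}G(u)\,du.$$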
J-s M u ) ' ' Jo 
Thus, by the inequality (66), one obtains that

$$\sup_{S\in\Theta_{\beta,r}}\mathbf E_{S,Q_0}\|\hat S_n - S\|^2 \ge \frac{1}{2m}\sum_{l=1}^m\frac{1}{nh + \delta^{-2}I_G} = \frac{1}{2nh + 2\omega_n^2\nu^{-2}I_G}.$$

This immediately implies (33). Hence Theorem 4. □



5 Estimation based on discrete data. 

The model selection procedure developed in Section 2 is intended for continuous time observations. However, in a number of applied problems high frequency sampling cannot be provided. In this section, we consider the estimation problem for model (1) on the basis of observations $(y_{t_j})_{0\le j\le np}$ of the process $(y_t)_{t\ge 0}$ at discrete times $t_j = j/p$, where $p$ is a given odd number. To solve this problem, we will modify the model selection procedure of Section 2. Let $\mathcal X_p$ be the set of all 1-periodic functions $x : \mathbb R \to \mathbb R$ with the scalar product

$$(x,y)_p = \frac1p\sum_{j=1}^p x(t_j)\,y(t_j). \tag{38}$$

Let $(\phi_j)_{1\le j\le p}$ be an orthonormal basis in $\mathcal X_p$, i.e. $(\phi_i,\phi_j)_p = 0$ if $i \ne j$ and $\|\phi_j\|_p = 1$. One can use, for example, the trigonometric basis (31).
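That the trigonometric system (31) is orthonormal for the discrete scalar product can be checked numerically. The following sketch is illustrative only (the function names `phi`, `inner_p` and `gram` are hypothetical): it builds the first $p$ functions of (31) and their Gram matrix with respect to $(\cdot,\cdot)_p$ on the grid $t_k = k/p$.

```python
import math


def phi(j, x):
    """Trigonometric system (31): phi_1 = 1, phi_j = sqrt(2) * Tr_j(2 pi [j/2] x)."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    # Tr_j is cos for even j and sin for odd j
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def inner_p(f, g, p):
    """Discrete scalar product (f, g)_p = (1/p) * sum_{k=1}^p f(t_k) g(t_k), t_k = k/p."""
    return sum(f(k / p) * g(k / p) for k in range(1, p + 1)) / p


def gram(p):
    """Gram matrix of phi_1, ..., phi_p with respect to the discrete scalar product."""
    return [[inner_p(lambda x: phi(i, x), lambda x: phi(j, x), p)
             for j in range(1, p + 1)]
            for i in range(1, p + 1)]
```

For odd $p$ the Gram matrix is the identity up to rounding. For even $p$ the last function $\sqrt2\cos(\pi p x)$ takes the values $\pm\sqrt2$ on the grid and has squared norm 2, so orthonormality fails; this appears to be one reason for restricting to odd $p$, although the text does not state it explicitly.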

Assume that the noise $(\xi_t)_{t\ge 0}$ in (1) is such that

C*1) The vector $\zeta^*(n) = (\zeta_1^*(n),\ldots,\zeta_p^*(n))'$ with components

$$\zeta_j^*(n) = \frac{1}{\sqrt n}\sum_{k=1}^{np}\phi_j(t_k)\,\Delta\xi_{t_k}, \qquad \Delta\xi_{t_k} = \xi_{t_k} - \xi_{t_{k-1}}, \tag{39}$$

is gaussian with non-degenerate covariance matrix $B^*_{n,p} = \mathbf E\,\zeta^*(n)(\zeta^*(n))'$;

C*2) The maximal eigenvalues of the matrices $B^*_{n,p}$ are uniformly bounded:

$$\sup_{n\ge 1}\ \sup_{p\ge 1}\ \lambda_{\max}(B^*_{n,p}) \le \lambda^*,$$

where $\lambda^*$ is some known positive constant.

Conditions C*1), C*2) are satisfied for processes (2), (3) (cf. Lemmas 2-3).
Now we denote by $\mathcal M_p$ some set of subsets of $\{1,\ldots,p\}$ and by $(\mathcal D_{m,p})_{m\in\mathcal M_p}$ a family of linear subspaces of $\mathcal X_p$ such that

$$\mathcal D_{m,p} = \{x \in \mathcal X_p : x = \sum_{j\in m}\lambda_j\phi_j,\ \lambda_j \in \mathbb R\}.$$

Let $S_{m,p}$ denote the projection of $S$ on $\mathcal D_{m,p}$ in $\mathcal X_p$ and $\tilde S_{m,p}$ denote an estimator of $S_{m,p}$, i.e. a measurable function of the observations $(y_{t_j})_{0\le j\le np}$ taking values in $\mathcal D_{m,p}$. One can use, for example, the LSE $\hat S_{m,p}$ for $S_{m,p}$, which is defined as the minimizer with respect to $x \in \mathcal D_{m,p}$ of the distance




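that is, the quantity

$$\gamma_{n,p}(x) = \|x\|_p^2 - \frac2n\sum_{k=1}^{np}x(t_k)\,\Delta y_{t_k} \tag{40}$$

and has the form

$$\hat S_{m,p} = \sum_{j\in m}\hat\alpha_{j,p}\,\phi_j, \qquad \hat\alpha_{j,p} = \frac1n\sum_{k=1}^{np}\phi_j(t_k)\,\Delta y_{t_k}. \tag{41}$$

Let the penalty term $P_n(m)$ be defined, as before, by (13). Then the model selection procedure corresponding to a family of projective estimators $(\tilde S_{m,p})_{m\in\mathcal M_p}$ is defined as $\tilde S_{\tilde m_p,p}$, where

$$\tilde m_p = \mathrm{argmin}_{m\in\mathcal M_p}\{\gamma_{n,p}(\tilde S_{m,p}) + P_n(m)\}. \tag{42}$$

In the case of the LSE family $(\hat S_{m,p})_{m\in\mathcal M_p}$, it will be $\hat S_{\hat m_p,p}$.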

As a measure of accuracy of the approximation of a 1-periodic function $S$ of continuous argument $t$ by its values on the grid $(t_j)_{1\le j\le p}$, we will use the function

$$H_p(S) = \frac1p\sum_{k=1}^p h_k^2(S), \qquad h_k(S) = p\int_{t_{k-1}}^{t_k}(S(t) - S(t_k))\,dt. \tag{43}$$

The following theorem gives the oracle inequality for a general model selection procedure based on the discrete time observations.

Theorem 5. Assume that the conditions C*1)-C*2) hold. Then the estimator $\tilde S_{\tilde m_p,p}$ satisfies the oracle inequality

VsWS^ ~S\\l< inf a (S) + 8H P (S) + ^, (44) 

p r m£M p n 

where

$$a_{m,p}(S) = 7\,\mathbf E_S\|\tilde S_{m,p} - S\|_p^2 + 32\lambda^* z_*\frac{l_m d_m}{n}.$$

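Now we obtain the oracle inequality (44) for the least squares model selection procedure $\hat S_{\hat m_p,p}$. To this end, we have to calculate the estimation accuracy of $\hat S_{m,p}$ for $S_{m,p}$, which is the projection of $S$ on $\mathcal D_{m,p}$, i.e.

$$S_{m,p} = \sum_{j\in m}\alpha_{j,p}\,\phi_j, \qquad \alpha_{j,p} = (S,\phi_j)_p = \frac1p\sum_{k=1}^p S(t_k)\,\phi_j(t_k).$$

First of all, we note that in this case

$$\hat\alpha_{j,p} - \alpha_{j,p} = \frac1p\sum_{k=1}^p\phi_j(t_k)\,h_k(S) + \frac1n\int_0^n\psi_{j,p}(t)\,d\xi_t,$$

where $\psi_{j,p}(t) = \sum_{k=1}^{np}\phi_j(t_k)\,\mathbf 1_{(t_{k-1},t_k]}(t)$. In view of the condition C*2), this implies that

$$\mathbf E_S(\hat\alpha_{j,p} - \alpha_{j,p})^2 = \Big(\frac1p\sum_{k=1}^p\phi_j(t_k)\,h_k(S)\Big)^2 + \frac{1}{n^2}\,\mathbf E_S\Big(\int_0^n\psi_{j,p}(t)\,d\xi_t\Big)^2 \le H_p(S) + \frac{\lambda^*}{n}.$$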

Corollary 3. Under the conditions C*1)-C*2) the LSE procedure $\hat S_{\hat m_p,p}$ satisfies the inequality

Es ll4 p , P " Slip < mf 6 miP (5) + 8# P (S) + ^ , (45) 



where 



$$b_{m,p}(S) = 7\|S_{m,p} - S\|_p^2 + 7 d_m H_p(S) + \lambda^*(7 + 32 z_*)\frac{l_m d_m}{n}.$$
Now we consider the estimation problem for the model (1)-(2) on the basis of discrete data in the asymptotic setting. First, for any $\beta > 1$, we set

$$R_{n,p}(\hat S_n,\beta) = \sup_{S\in\Theta_{\beta,r}}\ \sup_{Q\in\mathcal P_\kappa}\mathbf E_{S,Q}\|\omega_n(\hat S_n - S)\|_p^2, \tag{46}$$

where the set $\Theta_{\beta,r}$ is defined by (27) with the use of the trigonometric basis (31), $\omega_n = \omega_n(\beta) = n^{\beta/(2\beta+1)}$ and the set $\mathcal P_\kappa$ is defined in (29). As in Section 4, in order to minimize this risk, we apply the least squares model selection procedure $\hat S_{\hat m_p,p}$ constructed on the basis of the trigonometric system (31) with the ordered selection, that is, $\mathcal M_p = \{m_1,\ldots,m_p\}$ with $m_j = \{1,\ldots,j\}$. In this case $d_{m_j} = j$ and $l_{m_j} = 1$ for $1 \le j \le p$.

It is shown in Appendix 6.7 that if $p > n^{1/2}$, then for any $\varepsilon > 0$

$$\limsup_{n\to\infty}\ \sup_{\beta>1+\varepsilon} R_{n,p}(\hat S_{\hat m_p,p},\beta) < \infty \tag{47}$$

and if $p > n^{1/2}$, then for any $\beta > 1$

$$\liminf_{n\to\infty}\ \inf_{\hat S_n} R_{n,p}(\hat S_n,\beta) > 0. \tag{48}$$

This means that the adaptive estimator $\hat S_{\hat m_p,p}$ with $p > n^{1/2}$ (in particular, one can take $p = 2[n^{1/2}] + 1$) is optimal in the sense of the risk (46).



6 Appendix 

6.1 Properties of processes (2)-(3)

We start with the result for process (2) which shows that both conditions C1)-C2) and C*1)-C*2) are satisfied.

Lemma 2. Let $(\xi_t)$ be defined by (2) with $\theta \le 0$, $f = (f_j)_{j\ge 1}$ be a family of linearly independent cadlag 1-periodic functions on $\mathbb R$, $I_n(f) = \int_0^n f(u)\,d\xi_u$. Then the matrix

$$V_{k,n}(f) = (v_{ij}(f))_{1\le i,j\le k} \tag{49}$$

with elements $v_{ij}(f) = \mathbf E\,I_n(f_i)I_n(f_j)$ is positive definite for each $k \ge 1$, $n \ge 1$ and $\theta \le 0$. Moreover, if $(f_j)_{j\ge 1}$ is orthonormal, then for any $\theta \le 0$

$$\sup_{k\ge 1}\ \sup_{n\ge 1}\ \sup_{|z|=1}\frac1n\,z'V_{k,n}(f)z \le 2. \tag{50}$$
fe>l n>l |z|=l n 
Proof. Assume that for some $n \ge 1$, $k \ge 1$ and $z = (z_1,\ldots,z_k)' \in \mathbb R^k$, $z'V_{k,n}(f)z = 0$. Since

$$z'V_{k,n}(f)z = \mathbf E\,I_n^2(g),$$

where $g(t) = \sum_{j=1}^k z_j f_j(t)$, one gets $I_n(g) = \int_0^n g(t)\,d\xi_t = 0$ a.s. Taking into account that the distribution of $I_n(g)$ for model (2) is equivalent to that of the random variable $\int_0^n g(t)\,dw_t$, this implies that $g(t) = 0$ for all $t \in [0,n]$. Thus $z_1 = \ldots = z_k = 0$ and we come to the first assertion. Let us check (50). By applying Ito's formula one obtains

$$\mathbf E\,I_n^2(g) = 2\theta\int_0^n g(t)\,\mathbf E\,I_t(g)\xi_t\,dt + \int_0^n g^2(t)\,dt,$$

where

$$\mathbf E\,I_t(g)\xi_t = \frac12\int_0^t g(u)\,e^{\theta(t-u)}\,du.$$

Therefore,

$$\mathbf E\,I_n^2(g) = \theta\int_0^n e^{\theta v}\int_v^n g(t)\,g(t-v)\,dt\,dv + \int_0^n g^2(t)\,dt. \tag{51}$$

From here, it follows that for any $\theta \le 0$

$$z'V_{k,n}(f)z \le \int_0^n g^2(t)\,dt\,\Big[1 + |\theta|\int_0^\infty e^{\theta v}\,dv\Big] \le 2n\int_0^1 g^2(t)\,dt = 2n\sum_{j=1}^k z_j^2 = 2n.$$

This completes the proof of Lemma 2. □

Lemma 3. Let $(\xi_t)$ be defined by (3) with $\theta \in A$, $f = (f_j)_{j\ge 1}$ be a family of linearly independent cadlag 1-periodic functions on $\mathbb R$. Then the matrix (49) is positive definite for each $k \ge 1$, $n \ge 1$ and $\theta \in A$. Moreover, if $(f_j)_{1\le j\le k}$ is orthonormal, then for any $0 < \delta < 1$ and $\theta \in K_\delta$

$$\sup_{k\ge 1}\ \sup_{n\ge 1}\ \sup_{|z|=1}\frac1n\,z'V_{k,n}(f)z \le \lambda^*(\delta), \tag{52}$$

where $\lambda^*(\delta)$ is defined in (8).
Proof. Let $\eta_t$ be process (3) with zero initial values, i.e.

$$d\eta_t^{(q-1)} = \sum_{i=1}^q\theta_i\,\eta_t^{(q-i)}\,dt + dw_t$$

and $\eta_0 = \ldots = \eta_0^{(q-1)} = 0$. Then $\xi_t$ can be written as

$$\xi_t = \langle e^{At}Y\rangle_q + \eta_t, \tag{53}$$

where $\langle X\rangle_i$ denotes the $i$th component of a vector $X$; $A$ is the matrix defined in (5) and $Y$ is a gaussian vector in $\mathbb R^q$ independent of $(\eta_t)_{t\ge 0}$ with zero mean and covariance matrix

$$F = \int_0^\infty e^{Au}D_q\,e^{A'u}\,du, \tag{54}$$

where $D_q$ is the $q\times q$ matrix in which all elements except the (1,1) element are equal to zero and the (1,1) element is equal to 1. In view of (53), one has

$$I_n(g) = \zeta + \int_0^n g(t)\,d\eta_t, \tag{55}$$

where $\zeta = \langle\int_0^n g(t)Ae^{At}\,dt\,Y\rangle_q$. Integration by parts yields

$$\int_0^n g(t)\,d\eta_t = \int_0^n G_{q-1}(t)\,d\eta_t^{(q-1)},$$

where $G_0(t) = g(t)$ and $G_j(t) = \int_t^n G_{j-1}(u)\,du$ for $1 \le j \le q-1$.

Now assume that for some $n \ge 1$, $k \ge 1$ and $z = (z_1,\ldots,z_k)' \in \mathbb R^k$, $z'V_{k,n}(f)z = 0$. Since

$$z'V_{k,n}(f)z = \mathbf E\,I_n^2(g) = \mathbf E\,\zeta^2 + \mathbf E\Big(\int_0^n g(t)\,d\eta_t\Big)^2,$$

this implies that

$$\mathbf E\Big(\int_0^n g(t)\,d\eta_t\Big)^2 = \mathbf E\Big(\int_0^n G_{q-1}(t)\,d\eta_t^{(q-1)}\Big)^2 = 0.$$

Taking into account that the distribution of the process $(\eta_t^{(q-1)})_{0\le t\le n}$ in $C[0,n]$ is equivalent to the Wiener measure, we have

$$\int_0^n G_{q-1}(t)\,dw_t = 0 \quad\text{a.s.}$$

and therefore $G_{q-1}(t) = 0$ for all $0 \le t \le n$ and hence $g(\cdot) = 0$ and we obtain $z_1 = \ldots = z_k = 0$. This leads to the first assertion.
Let us show (52). By direct calculations we find



< Ae Au FA' > M (J g(u + s) g(s)ds) du . 

where $\langle A\rangle_{i,j}$ denotes the $(i,j)$-th element of a matrix $A$. By applying the Bunyakovskii-Cauchy-Schwarz inequality one gets

$$\mathbf E\,I_n^2(g) \le 2\int_0^n g^2(s)\,ds\int_0^n|\langle Ae^{Au}FA'\rangle_{q,q}|\,du.$$

Since

$$\int_0^n g^2(s)\,ds = n\int_0^1 g^2(s)\,ds = n|z|^2,$$

we obtain, for $|z| = 1$, the estimate

$$\frac1n\,z'V_{k,n}(f)z \le 2\int_0^\infty|\langle Ae^{Au}FA'\rangle_{q,q}|\,du \le 2|A|^2|F|J(A),$$

where $J(A) = \int_0^\infty|e^{Au}|\,du$. In order to come to (52) it remains to use the following inequality for matrix exponents of order $q \ge 2$ (see, for example, Kabanov and Pergamenshchikov (2003), p. 228)

\e tB \ < e tA ^1 + 2\B\ ^ i(2i|B|)A , 

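where $\Lambda = \max_{1\le j\le q}\mathrm{Re}\,\lambda_j$ and $\lambda_j$ are the eigenvalues of the matrix $B$. Indeed, from (54) for any $\theta \in K_\delta$ we find that

$$|F| \le F^*(\delta) \quad\text{and}\quad J(A) \le J^*(\delta),$$

where the functions $F^*(\delta)$ and $J^*(\delta)$ are defined in (8). Hence Lemma 3. □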



6.2 Mean forecast inequality 

Lemma 4. (Galtchouk and Pergamenshchikov (2005))

Let $\alpha$ and $\xi$ be two positive random variables. Let $\beta$ be a positive real number and $\{\Gamma_x, x \ge 0\}$ be a family of events such that, for any $x$,

$$\mathbf P(\xi > \alpha + \beta x,\ \Gamma_x) = 0.$$

Assume also that there exists some positive function $M(x)$, integrable on $\mathbb R_+$, dominating $\mathbf P(\Gamma_x^c)$. Then

$$\mathbf E\,\xi \le \mathbf E\,\alpha + \beta M^*,$$

where $M^* = \int_0^\infty M(x)\,dx$.

Proof. We set $\eta = (\xi - \alpha)_+$. Thus $\mathbf E\,\xi \le \mathbf E\,\alpha + \mathbf E\,\eta$. Moreover,

$$\mathbf E\,\eta = \int_0^\infty\mathbf P(\eta > z)\,dz = \int_0^\infty\mathbf P(\xi > \alpha + z)\,dz = \beta\int_0^\infty\mathbf P(\xi > \alpha + \beta x)\,dx \le \beta\int_0^\infty\mathbf P(\Gamma_x^c)\,dx \le \beta M^*.$$

□

6.3 Proof of Theorem 1

By making use of (1) and (10) we obtain

$$
\|z - S\|^2 = \gamma_n(z) + 2F_n(z) + \|S\|^2 , \tag{56}
$$

where $F_n(z) = n^{-1}\int_0^n z(t)\,\mathrm{d}\xi_t$. Further, from the definition (14) it follows
that for each $m \in \mathcal{M}$

$$
\gamma_n(\tilde S_{\hat m}) \le \gamma_n(S_m) + P_n(m) - P_n(\hat m) .
$$

Thus

$$
\|\tilde S_{\hat m} - S\|^2 \le \|S_m - S\|^2 + 2F_n(\tilde z) + \varpi_n(m, \hat m) , \tag{57}
$$

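where $\tilde z = \tilde S_{\hat m} - S_m$ and $\varpi_n(m, \hat m) = P_n(m) - P_n(\hat m)$. Now for each $x > 0$,
$0 < \mu < 1/(2\lambda^*)$ and $\iota \in \mathcal{M}$ we introduce the following gaussian function
on $\mathcal{D}_\iota + \mathcal{D}_m$:

$$
U_{x,\iota}(z, \mu) = \frac{2F_n(z)}{\|z\|^2 + g_\iota(x, \mu)} , \quad z \in \mathcal{D}_\iota + \mathcal{D}_m , \tag{58}
$$

where

$$
g_\iota(x, \mu) = \frac{16}{n}\,\frac{c(\mu) N + d_\iota l_\iota + x}{\mu} \quad \text{with} \quad c(\mu) = -\frac{1}{2}\ln(1 - 2\mu\lambda^*)
$$

and $N = \dim(\mathcal{D}_\iota + \mathcal{D}_m)$.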
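Moreover, let the functions $\phi_{i_1}, \ldots, \phi_{i_N}$ be the subset of $(\phi_j)_{j \ge 1}$ which is a
basis in $\mathcal{D}_\iota + \mathcal{D}_m$. It should be noted that $N \le d_\iota + d_m$. Then one can write
the normalized vector $\bar z = z/\|z\|$ for $z \ne 0$ as

$$
\bar z = \sum_{j=1}^N a_j \phi_{i_j} \quad \text{with} \quad \sum_{j=1}^N a_j^2 = 1 .
$$

Therefore

$$
U_{x,\iota}(z, \mu) = \frac{2 n^{-1/2}\,\|z\|}{\|z\|^2 + g_\iota(x, \mu)} \sum_{j=1}^N a_j \zeta_{i_j}
\quad \text{with} \quad \zeta_{i_j} = \frac{1}{\sqrt{n}} \int_0^n \phi_{i_j}(t)\,\mathrm{d}\xi_t .
$$

By applying the Bunyakovskii-Cauchy-Schwarz inequality one gets

$$
|U_{x,\iota}(z, \mu)| \le \frac{2\,\|z\|\,\sqrt{\eta_\iota}}{\sqrt{n}\,\bigl(\|z\|^2 + g_\iota(x, \mu)\bigr)} \le \sqrt{\frac{\eta_\iota}{n\, g_\iota(x, \mu)}} ,
$$

where $\eta_\iota = \sum_{j=1}^N \zeta_{i_j}^2$. Now note that by the condition C1) the vector
$(\zeta_{i_1}, \ldots, \zeta_{i_N})$ is gaussian with zero mean and a non-degenerate covariance matrix
$B_\iota$. Therefore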

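$$
\mathbf{E}\, e^{\mu \sum_{j=1}^N \zeta_{i_j}^2} = \frac{1}{(2\pi)^{N/2}\sqrt{\det B_\iota}} \int_{\mathbb{R}^N} e^{-\frac{1}{2} x' T_\iota^{-1} x}\,\mathrm{d}x ,
$$

where $T_\iota = (B_\iota^{-1} - 2\mu I_N)^{-1}$ and $I_N$ is the identity matrix of order $N$. One
can easily verify that

$$
\mathbf{E}\, e^{\mu \sum_{j=1}^N \zeta_{i_j}^2} = \sqrt{\frac{\det T_\iota}{\det B_\iota}} = \frac{1}{\sqrt{\det(I_N - 2\mu B_\iota)}} .
$$

Thus, in view of the inequality

$$
\det(I_N - 2\mu B_\iota) \ge \bigl(1 - 2\mu\lambda_{\max}(B_\iota)\bigr)^N
$$

and the condition C2), we obtain

$$
\mathbf{E}\, e^{\mu \sum_{j=1}^N \zeta_{i_j}^2} \le (1 - 2\mu\lambda^*)^{-N/2} = e^{c(\mu) N} ,
$$

where $c(\mu)$ is defined in (58).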
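The exponential moment bound above can be checked by hand in dimension $N = 2$. The sketch below assumes a centred gaussian vector with a hypothetical $2 \times 2$ covariance matrix $B$ (the entries and the value of $\lambda^*$ are illustrative choices, not values from the paper), evaluates the closed form $1/\sqrt{\det(I_N - 2\mu B)}$, and compares it with $(1 - 2\mu\lambda^*)^{-N/2} = e^{c(\mu)N}$, where $c(\mu) = -\frac{1}{2}\ln(1 - 2\mu\lambda^*)$.

```python
import math


def exp_moment_and_bound(b11, b12, b22, mu, lam_star):
    """Return (exact, bound) for a centred Gaussian pair with covariance
    B = [[b11, b12], [b12, b22]]:
      exact = 1 / sqrt(det(I - 2*mu*B))   (closed-form exponential moment)
      bound = (1 - 2*mu*lam_star)^(-N/2)  with N = 2.
    """
    n_dim = 2
    # Largest eigenvalue of a symmetric 2x2 matrix via the quadratic formula.
    lam_max = 0.5 * (b11 + b22) + math.hypot(0.5 * (b11 - b22), b12)
    if lam_max > lam_star:
        raise ValueError("lam_star must dominate the largest eigenvalue of B")
    if not 0.0 < mu < 1.0 / (2.0 * lam_star):
        raise ValueError("mu must lie in (0, 1 / (2 * lam_star))")
    det = (1.0 - 2.0 * mu * b11) * (1.0 - 2.0 * mu * b22) - (2.0 * mu * b12) ** 2
    exact = 1.0 / math.sqrt(det)
    c_mu = -0.5 * math.log(1.0 - 2.0 * mu * lam_star)
    bound = math.exp(c_mu * n_dim)
    return exact, bound


exact, bound = exp_moment_and_bound(1.0, 0.5, 2.0, mu=0.1, lam_star=2.5)
print(f"exact moment = {exact:.4f}, bound = {bound:.4f}")
```

With these illustrative numbers $\det(I_2 - 2\mu B) = 0.47$, so the exact moment is $0.47^{-1/2}$, while the bound equals $(1 - 0.5)^{-1} = 2$.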

Now, by the Chebyshev inequality, for any $b > 0$ and $0 < \mu < 1/(2\lambda^*)$, we
obtain that

$$
\mathbf{P}(\eta_\iota > b) \le e^{c(\mu) N - \mu b} . \tag{60}
$$
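Choosing in this inequality

$$
b = b_*(x, \iota) = \frac{1}{16}\, n\, g_\iota(x, \mu) = \frac{c(\mu) N + d_\iota l_\iota + x}{\mu}
$$

yields

$$
\mathbf{P}\bigl(\eta_\iota > b_*(x, \iota)\bigr) \le e^{-x - d_\iota l_\iota} .
$$

Now let $\Gamma_x = \{\sup_{\iota \in \mathcal{M}} \eta_\iota / b_*(x, \iota) \le 1\}$. It is easy to see that

$$
\mathbf{P}(\Gamma_x^c) \le \sum_{\iota \in \mathcal{M}} \mathbf{P}\bigl(\eta_\iota > b_*(x, \iota)\bigr) \le \sum_{\iota \in \mathcal{M}} e^{-x - d_\iota l_\iota} = l^* e^{-x} .
$$

Thus, we obtain the following upper bound on the set $\Gamma_x$:

$$
\sup_{\iota \in \mathcal{M}}\ \sup_{z \in \mathcal{D}_\iota + \mathcal{D}_m} |U_{x,\iota}(z, \mu)| \le 1/4 ,
$$

which implies

$$
\begin{aligned}
2F_n(\tilde z) &= U_{x,\hat m}(\tilde z, \mu)\bigl(\|\tilde z\|^2 + g_{\hat m}(x, \mu)\bigr)
\le \frac{1}{2}\|\tilde S_{\hat m} - S\|^2 + \frac{1}{2}\|S_m - S\|^2 \\
&\quad + \frac{4}{n\mu}\bigl(c(\mu)\, d_m l_m + (c(\mu) + 1)\, d_{\hat m} l_{\hat m}\bigr) + \frac{4}{n\mu}\, x .
\end{aligned}
$$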

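By making use of this inequality in (57) we obtain, on the set $\Gamma_x$, that

$$
\begin{aligned}
\|\tilde S_{\hat m} - S\|^2 &\le \frac{1}{2}\|\tilde S_{\hat m} - S\|^2 + \frac{3}{2}\|S_m - S\|^2 + \varpi_n(m, \hat m) \\
&\quad + \frac{4}{n\mu}\bigl(c(\mu)\, d_m l_m + (c(\mu) + 1)\, d_{\hat m} l_{\hat m}\bigr) + \frac{4}{n\mu}\, x \\
&= \frac{1}{2}\|\tilde S_{\hat m} - S\|^2 + \frac{3}{2}\|S_m - S\|^2 + \Omega(\rho, \mu) + \frac{4}{n\mu}\, x ,
\end{aligned}
$$

i.e.

$$
\|\tilde S_{\hat m} - S\|^2 \le 3\|S_m - S\|^2 + 2\,\Omega(\rho, \mu) + \frac{8}{n\mu}\, x ,
$$

where

$$
\Omega(\rho, \mu) = \left(\rho + \frac{4c(\mu)}{\mu}\right)\frac{d_m l_m}{n} + \left(\frac{4c(\mu) + 4}{\mu} - \rho\right)\frac{d_{\hat m} l_{\hat m}}{n} .
$$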

It is clear that to obtain a nonrandom minimal upper bound for this term
we have to solve the following optimization problem

$$
\rho + \frac{4c(\mu)}{\mu} \to \min \quad \text{subject to} \quad \frac{4c(\mu) + 4}{\mu} - \rho \le 0 . \tag{61}
$$

One can check directly that the solution of this problem is $\mu = (z_* - 1)/(2\lambda^* z_*)$
and the optimal value for $\rho$ is given in (13). Thus, by choosing these parameters
we have on the set $\Gamma_x$

$$
\|\tilde S_{\hat m} - S\|^2 \le 3\|S_m - S\|^2 + 16\lambda^* z_* \frac{l_m d_m}{n} + \frac{16\lambda^* z_*}{n(z_* - 1)}\, x .
$$

Applying now Lemma 4 with $\xi = \|\tilde S_{\hat m} - S\|^2$,

$$
\alpha = 3\|S_m - S\|^2 + 16\lambda^* z_*\, l_m d_m / n ,
$$

$\beta = 16\lambda^* z_* / (n(z_* - 1))$ and $M(x) = l^* e^{-x}$ we obtain the inequality (16). Hence Theorem 1. □
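The optimization problem (61) can also be explored numerically. The sketch below rests on assumptions that are not spelled out in the text: that $c(\mu) = -\frac{1}{2}\ln(1 - 2\mu\lambda^*)$, that the constraint is active at the optimum (so $\rho = (4c(\mu) + 4)/\mu$ and the objective becomes $(8c(\mu) + 4)/\mu$), and that $z_*$ is the root larger than one of $\ln z = z - 2$, which is what the first-order condition gives after the substitution $1 - 2\mu\lambda^* = 1/z$. The value of `LAMBDA_STAR` is a hypothetical choice used only for illustration.

```python
import math

LAMBDA_STAR = 1.5  # hypothetical value of lambda^*, for illustration only


def c(mu, lam=LAMBDA_STAR):
    """c(mu) = -(1/2) * ln(1 - 2 * mu * lambda^*)."""
    return -0.5 * math.log(1.0 - 2.0 * mu * lam)


def objective(mu, lam=LAMBDA_STAR):
    """rho + 4c(mu)/mu with the constraint active: rho = (4c(mu) + 4)/mu."""
    return (8.0 * c(mu, lam) + 4.0) / mu


def golden_section_min(f, a, b, tol=1e-10):
    """Minimise a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def solve_z_star(tol=1e-13):
    """Root of z - ln(z) - 2 = 0 on [2, 4], found by bisection."""
    lo, hi = 2.0, 4.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) - 2.0 > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


mu_numeric = golden_section_min(objective, 1e-9, 1.0 / (2.0 * LAMBDA_STAR) - 1e-9)
z_star = solve_z_star()
mu_closed = (z_star - 1.0) / (2.0 * LAMBDA_STAR * z_star)
rho_closed = 4.0 * LAMBDA_STAR * z_star ** 2 / (z_star - 1.0)
print(f"z_star = {z_star:.6f}")
print(f"mu: numeric = {mu_numeric:.6f}, closed form = {mu_closed:.6f}")
print(f"min objective = {objective(mu_numeric):.6f}, 8*lambda*z_star = {8 * LAMBDA_STAR * z_star:.6f}")
```

Under these assumptions the minimal value of the objective is $8\lambda^* z_*$, which is consistent with the coefficient $16\lambda^* z_*$ multiplying $l_m d_m / n$ after the factor 2 in front of $\Omega(\rho, \mu)$, and the active constraint gives $\rho = 4\lambda^* z_*^2/(z_* - 1)$.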

6.4 Some properties of the Fourier coefficients 

Lemma 5. Let $S$ be a function in $C^k[0, 1]$ such that $S^{(j)}(0) = S^{(j)}(1)$ for all
$0 \le j \le k$ and such that, for some constants $L_0 > 0$, $L > 0$ and $0 < \alpha < 1$,

$$
\max_{0 \le j \le k-1}\ \max_{0 \le x \le 1} |S^{(j)}(x)| \le L_0 \quad \text{and} \quad |S^{(k)}(x) - S^{(k)}(y)| \le L\,|x - y|^\alpha \tag{62}
$$

for all $x, y \in [0, 1]$. Then the Fourier coefficients $(a_k)_{k \ge 0}$ and $(b_k)_{k \ge 1}$ of the
function $S$, defined as

$$
S(x) = \frac{a_0}{2} + \sum_{k=1}^\infty \bigl(a_k \cos(2\pi k x) + b_k \sin(2\pi k x)\bigr) ,
$$

satisfy the following inequality

$$
\sup_{n \ge 0}\ (n + 1)^\beta \left( \sum_{k=n}^\infty (a_k^2 + b_k^2) \right)^{1/2} \le c^* (L + L_0) , \tag{63}
$$

where $\beta = k + \alpha$ ($k$ being an integer and $0 < \alpha < 1$) and

8 L u 4 sin 4 (7r it) d« 

The proof of this lemma is given in Fourdrinier and Pergamenshchikov (2007),
Appendix A4.
6.5 Lower bound for the parametric model

We consider in this section the following model

$$
\mathrm{d}y_t = S(t, z)\,\mathrm{d}t + \mathrm{d}w_t , \tag{64}
$$

where $(w_t)_{t \ge 0}$ is a standard brownian motion and $z \in \mathbb{R}^m$ is an unknown vector
parameter. Let now $\pi$ be a prior distribution density on $\mathbb{R}^m$ of the form

$$
\pi(z) = \prod_{l=1}^m \pi_l(z_l) ,
$$

where $\pi_l$ is a positive density on the interval $[-\delta_l, \delta_l]$ for some $\delta_l > 0$. This
means that the density $\pi$ has the following support

$$
\Gamma = [-\delta_1, \delta_1] \times \ldots \times [-\delta_m, \delta_m] .
$$

We set

$$
\zeta_l(z) = \int_0^n \frac{\partial}{\partial z_l} S(t, z)\,\bigl(\mathrm{d}y_t - S(t, z)\,\mathrm{d}t\bigr) . \tag{65}
$$

Now we give a version of the van Trees inequality (Gill and Levit (1995))
for the process (64).



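Lemma 6. For each $1 \le l \le m$, any estimator $\hat z_l$ based on the observations $(y_t)_{0 \le t \le n}$
satisfies the following inequality

$$
\int_\Gamma \mathbf{E}_{S_z} (\hat z_l - z_l)^2\, \pi(z)\,\mathrm{d}z \ge \frac{1}{A_l + B_l} , \tag{66}
$$

where $\mathbf{E}_{S_z}$ denotes the expectation with respect to the distribution of the process (64),

$$
A_l = \int_\Gamma \mathbf{E}_{S_z}\, \zeta_l^2(z)\, \pi(z)\,\mathrm{d}z \quad \text{and} \quad B_l = \int_{-\delta_l}^{\delta_l} \frac{\bigl(\dot\pi_l(u)\bigr)^2}{\pi_l(u)}\,\mathrm{d}u .
$$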



Proof. Note that the density of the distribution of the process (64)
with respect to the Wiener measure $\mu_w$ on $\mathcal{Y} = C[0, n]$ is

$$
f(y, z) = \exp\left\{ \int_0^n S(t, z)\,\mathrm{d}y_t - \frac{1}{2}\int_0^n S^2(t, z)\,\mathrm{d}t \right\} .
$$

Therefore, by applying the method from Gill and Levit (1995) we obtain the
lower bound (66). Hence Lemma 6. □
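To make the two ingredients of a van Trees type bound concrete, the sketch below works through a one-dimensional toy case. All choices are assumptions for illustration, not taken from the paper: the prior term is taken to be the Fisher information of the prior, $\int \dot\pi^2/\pi$; the prior is the cosine-squared density $\pi(u) = \delta^{-1}\cos^2(\pi u/(2\delta))$ on $[-\delta, \delta]$, whose Fisher information is $\pi^2/\delta^2$; and the regression is $S(t, z) = z\,\phi(t)$ with $\int_0^1 \phi^2 = 1$ and $\phi$ 1-periodic, so that the data term equals $n$.

```python
import math


def prior_fisher_information(delta, n_grid=20_000):
    """Midpoint-rule approximation of the integral of (pi'(u))^2 / pi(u)
    over [-delta, delta] for pi(u) = cos^2(pi*u / (2*delta)) / delta."""
    a = math.pi / (2.0 * delta)
    h = 2.0 * delta / n_grid
    total = 0.0
    for k in range(n_grid):
        u = -delta + (k + 0.5) * h
        dens = math.cos(a * u) ** 2 / delta
        deriv = -2.0 * a * math.cos(a * u) * math.sin(a * u) / delta
        total += deriv ** 2 / dens * h
    return total


def van_trees_lower_bound(n, delta):
    """1 / (A + B) with A = n (data term for S(t, z) = z * phi(t), where
    phi has unit L2[0, 1] norm) and B the prior Fisher information."""
    return 1.0 / (n + prior_fisher_information(delta))


delta = 0.5
print(f"B = {prior_fisher_information(delta):.6f}, pi^2/delta^2 = {math.pi ** 2 / delta ** 2:.6f}")
print(f"lower bound for n = 100: {van_trees_lower_bound(100, delta):.6f}")
```

As $n$ grows the data term dominates and the bound behaves like $1/n$, the parametric rate.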
6.6 Proof of Theorem 5

To prove this theorem we adapt the proof of Theorem 1 to this case. Here
equality (56) becomes

$$
\|z - S\|_p^2 = \gamma_{n,p}(z) + 2F_{n,p}(z) + 2G_p(z, S) + \|S\|_p^2 ,
$$

with 

np p 
71 k=l " k=l 

where the sequence h k (S) is defined in ([13]) . Similarly to the proof of The- 
orem [U one can show that 

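$$
\begin{aligned}
\mathbf{E}_S \|\tilde S_{\hat m_p, p} - S\|_p^2 &\le 3\,\mathbf{E}_S \|S_{m,p} - S\|_p^2 + \frac{16\lambda^* z_*\, l_m d_m}{n} \\
&\quad + 4\,\mathbf{E}_S |G_p(\tilde z_p, S)| + \frac{16\lambda^* z_*\, l^*}{n(z_* - 1)} ,
\end{aligned} \tag{67}
$$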

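where $\tilde z_p = \tilde S_{\hat m_p, p} - S_{m,p}$. Now we note that for any $\nu > 0$

$$
\begin{aligned}
2G_p(\tilde z_p, S) &\le \nu \|\tilde z_p\|_p^2 + \nu^{-1} H_p(S) \\
&\le 2\nu \|\tilde S_{\hat m_p, p} - S\|_p^2 + 2\nu \|S_{m,p} - S\|_p^2 + \nu^{-1} H_p(S) .
\end{aligned}
$$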

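Therefore, taking into account the last inequality in (67), we obtain the
following upper bound

$$
\begin{aligned}
\mathbf{E}_S \|\tilde S_{\hat m_p, p} - S\|_p^2 &\le \frac{3 + 4\nu}{1 - 4\nu}\,\mathbf{E}_S \|S_{m,p} - S\|_p^2 + \frac{16\lambda^* z_*\, l_m d_m}{(1 - 4\nu)\, n} \\
&\quad + \frac{16\lambda^* z_*\, l^*}{(1 - 4\nu)\, n (z_* - 1)} + \frac{2}{\nu(1 - 4\nu)}\, H_p(S) .
\end{aligned}
$$

By minimizing the last term with respect to $\nu$ (i.e. maximizing $\nu(1 - 4\nu)$)
we find that $\nu = 1/8$. Thus the last inequality implies the required upper bound.
Hence Theorem 5. □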
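The choice $\nu = 1/8$ and the constants it produces can be verified with exact rational arithmetic. The sketch below assumes, as a reading of the argument and not as a quotation of the paper, that the term $4\,\mathbf{E}_S|G_p|$ is absorbed through $2G_p \le \nu\|\cdot\|_p^2 + \nu^{-1}H_p(S)$, which leads to the factors $(3 + 4\nu)/(1 - 4\nu)$, $1/(1 - 4\nu)$ and $2/(\nu(1 - 4\nu))$.

```python
from fractions import Fraction


def oracle_constants(nu):
    """Factors multiplying, respectively, the approximation term, the
    penalty-type terms and H_p(S) after moving 4*nu*E||.||^2 to the left."""
    denom = 1 - 4 * nu
    return (3 + 4 * nu) / denom, 1 / denom, 2 / (nu * denom)


# Exact grid search for the maximiser of nu * (1 - 4 * nu) on (0, 1/4).
grid = [Fraction(k, 1000) for k in range(1, 250)]
best_nu = max(grid, key=lambda v: v * (1 - 4 * v))
print("best nu =", best_nu)
print("constants =", oracle_constants(best_nu))
```

Under these assumptions the factor 3 in front of the approximation error becomes 7, the penalty-type terms are doubled, and $H_p(S)$ enters with the factor 32.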
6.7 Proof of (47)

Consider first the principal term in (|45|> . Let (sj)j>i be the Fourier coeffi- 
cients for 5 in £2[0, 1] used in 0^ jT .. By setting Aj(t) = S — X^=i s i ( Pi(t) j 



one can estimate HS^ p — S\\ p as 



l 5 mi ,p-^llp= inf \\S a j( j)i\\l < WAjfp. 
By the definition of $q_j(S)$ in (28) we obtain that

\\^f v < 2 / Aj(t)dt + 2 ^ / (Ajfo)- A^dt 

fc= i ^tfe-j 

= 2 ?j+1 (5) + 2^T f k ( [*" AjMdu) dt. (68) 

Moreover, the Bunyakovskii-Cauchy-Schwarz inequality implies that

[|A i ||;<2 S - +1 (5) +^[|A i f. 

Notice now that for the trigonometric basis (|3ip and for the functions S 
from r with /3 > 1 we obtain that for any j > 



DC 



2 -2 



i=j'+l i=j+l 

< n\j + l) 2 + 2vr 2 + 



^^^(j + ir 2 ^^. (69) 



Therefore for $1 \le j \le p$



r ( n j P 



see,,. 



sup \\S mj , p -Sf p <2^[l+-^^— i ). (70) 



Moreover, taking into account that H p (S) < p ||<S'|| , through ([69]) with 



J 



we get that for $p \ge n^{1/2}$



sup flp(5) < 2~7~^ — iT ^ — TT • 

See„, r P 2 (/?-l) n(^-l) 

Thus (45) implies that for any $\varepsilon > 0$ there exists some constant $C^* =
C^*(r, \varepsilon) > 0$ such that for any $\beta \ge 1 + \varepsilon$, $p \ge n^{1/2}$ and for $1 \le j \le p$

i 

This bound with $j = [n^{1/(2\beta+1)}] + 1$ implies immediately inequality (47).
□ 
6.8 Proof of (48)

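Notice now that for any estimator $\hat S_n$, by putting $T_p(\hat S_n)(t) = \sum_{j=1}^p \hat S_n(t_j)\,\chi_{(t_{j-1}, t_j]}(t)$,
we can represent the accuracy of this estimator as

$$
\|\hat S_n - S\|_p^2 = \|T_p(\hat S_n) - T_p(S)\|^2 .
$$

Therefore, for any $0 < \varepsilon < 1$ we can estimate this accuracy from below as

$$
\|\hat S_n - S\|_p^2 \ge (1 - \varepsilon)\,\|T_p(\hat S_n) - S\|^2 - (\varepsilon^{-1} - 1) \sum_{j=1}^p \int_{t_{j-1}}^{t_j} \bigl(S(t_j) - S(t)\bigr)^2\,\mathrm{d}t .
$$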

Moreover, similarly to (68)-(70) we obtain that

$$
\sum_{j=1}^p \int_{t_{j-1}}^{t_j} \bigl(S(t) - S(t_j)\bigr)^2\,\mathrm{d}t \le p^{-2}\,\|\dot S\|^2 \le r^2 n^{-1} .
$$

Therefore

$$
\mathcal{R}_{n,p}(\hat S_n, \beta) \ge (1 - \varepsilon)\,\mathcal{R}_n(T_n, \beta) - (\varepsilon^{-1} - 1)\,\frac{r^2}{n} ,
$$

where the risk $\mathcal{R}_n(T_n, \beta)$ is defined by (29) for some estimator $T_n$. Now
Theorem 4 directly implies (48). □